Overview

Dataset statistics

Number of variables: 23
Number of observations: 10000
Missing cells: 8704
Missing cells (%): 3.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 539.6 KiB
Average record size in memory: 55.3 B
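
The two derived figures in this block can be cross-checked from the raw counts. The following is a minimal sketch, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 23
n_observations = 10_000
missing_cells = 8_704
total_size_kib = 539.6

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (3.8% and 55.3 B).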

Variable types

Numeric: 8
Categorical: 15

Warnings

age has a high cardinality: 100 distinct values (High cardinality)
hhwt is highly correlated with perwt (High correlation)
perwt is highly correlated with hhwt (High correlation)
sample is highly correlated with country (High correlation)
empstat is highly correlated with empstatd (High correlation)
indig is highly correlated with race (High correlation)
country is highly correlated with sample (High correlation)
race is highly correlated with indig (High correlation)
edattain is highly correlated with edattaind (High correlation)
empstatd is highly correlated with empstat (High correlation)
edattaind is highly correlated with edattain (High correlation)
internet has 1801 (18.0%) missing values (Missing)
race has 4756 (47.6%) missing values (Missing)
indig has 2147 (21.5%) missing values (Missing)
df_index has unique values (Unique)
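
Each high-correlation warning is reported once per direction, so the ten warnings above describe only five distinct variable pairs. The sketch below collapses them into unordered pairs; the list is transcribed by hand from the warnings and the helper names are illustrative.

```python
# Directed (variable, correlated_with) pairs transcribed from the warnings.
warnings = [
    ("hhwt", "perwt"), ("perwt", "hhwt"),
    ("sample", "country"), ("empstat", "empstatd"),
    ("indig", "race"), ("country", "sample"),
    ("race", "indig"), ("edattain", "edattaind"),
    ("empstatd", "empstat"), ("edattaind", "edattain"),
]

# Sorting each pair makes (a, b) and (b, a) identical, so the set removes
# the mirrored duplicates.
unique_pairs = sorted({tuple(sorted(pair)) for pair in warnings})

for left, right in unique_pairs:
    print(f"{left} <-> {right}")
```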

Reproduction

Analysis started: 2021-01-11 17:02:54.221857
Analysis finished: 2021-01-11 17:03:08.978034
Duration: 14.76 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
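
The reported duration is the difference between the two timestamps. A small sketch using only the Python standard library, with the timestamp format inferred from the values above:

```python
from datetime import datetime

# Format inferred from the timestamps printed in the report.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-01-11 17:02:54.221857", fmt)
finished = datetime.strptime("2021-01-11 17:03:08.978034", fmt)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```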

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26064256.58
Minimum: 14710
Maximum: 52546277
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 14710
5-th percentile: 2609690.45
Q1: 12934202.25
median: 26004179
Q3: 38974435.25
95-th percentile: 49747660.85
Maximum: 52546277
Range: 52531567
Interquartile range (IQR): 26040233

Descriptive statistics

Standard deviation: 15124513.48
Coefficient of variation (CV): 0.5802779538
Kurtosis: -1.191586877
Mean: 26064256.58
Median Absolute Deviation (MAD): 13014604
Skewness: 0.006525841157
Sum: 2.606425658 × 10^11
Variance: 2.287509079 × 10^14
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
49891327               1       < 0.1%
46624015               1       < 0.1%
25242903               1       < 0.1%
42401065               1       < 0.1%
49380629               1       < 0.1%
42775828               1       < 0.1%
46899191               1       < 0.1%
33755657               1       < 0.1%
45644707               1       < 0.1%
34504074               1       < 0.1%
Other values (9990)    9990    99.9%

Minimum 5 values

Value                  Count   Frequency (%)
14710                  1       < 0.1%
27684                  1       < 0.1%
29925                  1       < 0.1%
34204                  1       < 0.1%
35160                  1       < 0.1%

Maximum 5 values

Value                  Count   Frequency (%)
52546277               1       < 0.1%
52545139               1       < 0.1%
52535356               1       < 0.1%
52534316               1       < 0.1%
52530219               1       < 0.1%
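
The range, interquartile range and coefficient of variation reported for df_index follow directly from the other figures. A minimal sketch, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean); the inputs are copied from the report, so the last digits of the CV are limited by their rounding.

```python
# Values copied from the df_index statistics.
minimum, maximum = 14710, 52546277
q1, q3 = 12934202.25, 38974435.25
mean, std = 26064256.58, 15124513.48

value_range = maximum - minimum   # range of the observed values
iqr = q3 - q1                     # interquartile range
cv = std / mean                   # coefficient of variation

print(value_range, iqr, round(cv, 6))
```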

country
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.6 KiB

Value                  Count
brazil                 3946
mexico                 2144
colombia               762
argentina              761
peru                   519
Other values (11)      1868

Length

Max length: 18
Median length: 6
Mean length: 6.7396
Min length: 4

Characters and Unicode

Total characters: 67396
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
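
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation, and the two sample strings are simply values that appear in this column. The category codes `Ll` and `Zs` correspond to the "Lowercase Letter" and "Space Separator" rows in the tables below.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two values taken from the country column.
counts = category_counts(["brazil", "dominican republic"])
print(counts)
```

Script and block information is not exposed by `unicodedata`, so those two properties would need a separate lookup table or library.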

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: brazil
2nd row: mexico
3rd row: brazil
4th row: mexico
5th row: argentina
Value                  Count   Frequency (%)
brazil                 3946    39.5%
mexico                 2144    21.4%
colombia               762     7.6%
argentina              761     7.6%
peru                   519     5.2%
venezuela              417     4.2%
chile                  307     3.1%
ecuador                297     3.0%
dominican republic     172     1.7%
haiti                  145     1.5%
Other values (6)       530     5.3%
Histogram of lengths of the category
ValueCountFrequency (%)
brazil3946
38.1%
mexico2144
20.7%
colombia762
 
7.4%
argentina761
 
7.3%
peru519
 
5.0%
venezuela417
 
4.0%
chile307
 
3.0%
ecuador297
 
2.9%
republic172
 
1.7%
dominican172
 
1.7%
Other values (9)865
 
8.3%

Most occurring characters

ValueCountFrequency (%)
i8904
13.2%
a8297
12.3%
r6162
9.1%
l5818
8.6%
e5558
8.2%
b4880
7.2%
o4460
 
6.6%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (12)11698
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter67034
99.5%
Space Separator362
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
i8904
13.3%
a8297
12.4%
r6162
9.2%
l5818
8.7%
e5558
8.3%
b4880
7.3%
o4460
 
6.7%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (11)11336
16.9%
ValueCountFrequency (%)
362
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin67034
99.5%
Common362
 
0.5%

Most frequent character per script

ValueCountFrequency (%)
i8904
13.3%
a8297
12.4%
r6162
9.2%
l5818
8.7%
e5558
8.3%
b4880
7.3%
o4460
 
6.7%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (11)11336
16.9%
ValueCountFrequency (%)
362
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII67396
100.0%

Most frequent character per block

ValueCountFrequency (%)
i8904
13.2%
a8297
12.3%
r6162
9.1%
l5818
8.6%
e5558
8.2%
b4880
7.2%
o4460
 
6.6%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (12)11698
17.4%

year
Real number (ℝ≥0)

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.6268
Minimum: 2001
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 2001
5-th percentile: 2001
Q1: 2010
median: 2010
Q3: 2010
95-th percentile: 2015
Maximum: 2015
Range: 14
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.882219587
Coefficient of variation (CV): 0.001931811213
Kurtosis: -0.1044309751
Mean: 2009.6268
Median Absolute Deviation (MAD): 0
Skewness: -0.503504689
Sum: 20096268
Variance: 15.07162892
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)
Common values

Value    Count   Frequency (%)
2010     5239    52.4%
2015     2144    21.4%
2005     857     8.6%
2007     626     6.3%
2001     550     5.5%
2002     307     3.1%
2003     145     1.5%
2011     132     1.3%

Minimum 5 values

Value    Count   Frequency (%)
2001     550     5.5%
2002     307     3.1%
2003     145     1.5%
2005     857     8.6%
2007     626     6.3%

Maximum 5 values

Value    Count   Frequency (%)
2015     2144    21.4%
2011     132     1.3%
2010     5239    52.4%
2007     626     6.3%
2005     857     8.6%
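
Because year has only eight distinct values, its sum, mean and variance can be recomputed from the frequency table alone. The sketch below does so in plain Python. Dividing by n − 1 (the sample variance) is an assumption here; it is the choice that reproduces the reported figure.

```python
# Value -> count, copied from the common values table for year.
year_counts = {
    2010: 5239, 2015: 2144, 2005: 857, 2007: 626,
    2001: 550, 2002: 307, 2003: 145, 2011: 132,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n

# Assumption: sample variance with n - 1 in the denominator.
squared_dev = sum(count * (year - mean) ** 2 for year, count in year_counts.items())
variance = squared_dev / (n - 1)

print(n, total, mean, round(variance, 6))
```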

sample
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.6 KiB

Value                  Count
brazil 2010            3946
mexico 2015            2144
colombia 2005          762
argentina 2010         761
peru 2007              519
Other values (11)      1868

Length

Max length: 23
Median length: 11
Mean length: 11.7396
Min length: 9

Characters and Unicode

Total characters: 117396
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: brazil 2010
2nd row: mexico 2015
3rd row: brazil 2010
4th row: mexico 2015
5th row: argentina 2010
Value                      Count   Frequency (%)
brazil 2010                3946    39.5%
mexico 2015                2144    21.4%
colombia 2005              762     7.6%
argentina 2010             761     7.6%
peru 2007                  519     5.2%
venezuela 2001             417     4.2%
chile 2002                 307     3.1%
ecuador 2010               297     3.0%
dominican republic 2010    172     1.7%
haiti 2003                 145     1.5%
Other values (6)           530     5.3%
Histogram of lengths of the category
ValueCountFrequency (%)
20105239
25.7%
brazil3946
19.4%
20152144
10.5%
mexico2144
10.5%
2005857
 
4.2%
colombia762
 
3.7%
argentina761
 
3.7%
2007626
 
3.1%
2001550
 
2.7%
peru519
 
2.5%
Other values (17)2814
13.8%

Most occurring characters

ValueCountFrequency (%)
017724
15.1%
10362
 
8.8%
210307
 
8.8%
i8904
 
7.6%
a8297
 
7.1%
18197
 
7.0%
r6162
 
5.2%
l5818
 
5.0%
e5558
 
4.7%
b4880
 
4.2%
Other values (18)31187
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter67034
57.1%
Decimal Number40000
34.1%
Space Separator10362
 
8.8%

Most frequent character per category

ValueCountFrequency (%)
i8904
13.3%
a8297
12.4%
r6162
9.2%
l5818
8.7%
e5558
8.3%
b4880
7.3%
o4460
 
6.7%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (11)11336
16.9%
ValueCountFrequency (%)
017724
44.3%
210307
25.8%
18197
20.5%
53001
 
7.5%
7626
 
1.6%
3145
 
0.4%
ValueCountFrequency (%)
10362
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin67034
57.1%
Common50362
42.9%

Most frequent character per script

ValueCountFrequency (%)
i8904
13.3%
a8297
12.4%
r6162
9.2%
l5818
8.7%
e5558
8.3%
b4880
7.3%
o4460
 
6.7%
z4363
 
6.5%
c4115
 
6.1%
m3141
 
4.7%
Other values (11)11336
16.9%
ValueCountFrequency (%)
017724
35.2%
10362
20.6%
210307
20.5%
18197
16.3%
53001
 
6.0%
7626
 
1.2%
3145
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII117396
100.0%

Most frequent character per block

ValueCountFrequency (%)
017724
15.1%
10362
 
8.8%
210307
 
8.8%
i8904
 
7.6%
a8297
 
7.1%
18197
 
7.0%
r6162
 
5.2%
l5818
 
5.0%
e5558
 
4.7%
b4880
 
4.2%
Other values (18)31187
26.6%

serial
Real number (ℝ≥0)

Distinct: 9992
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1608447907
Minimum: 112001
Maximum: 6189373000
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 112001
5-th percentile: 44133550.05
Q1: 300191000.2
median: 893770000
Q3: 2499659000
95-th percentile: 5341378550
Maximum: 6189373000
Range: 6189260999
Interquartile range (IQR): 2199468000

Descriptive statistics

Standard deviation: 1679117758
Coefficient of variation (CV): 1.043936674
Kurtosis: 0.2236980339
Mean: 1608447907
Median Absolute Deviation (MAD): 763431999.5
Skewness: 1.151049553
Sum: 1.608447907 × 10^13
Variance: 2.819436445 × 10^18
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
1038852000             2       < 0.1%
179516001              2       < 0.1%
1714242000             2       < 0.1%
483438001              2       < 0.1%
2188438000             2       < 0.1%
68159000               2       < 0.1%
594681000              2       < 0.1%
8729000                2       < 0.1%
26910000               1       < 0.1%
181732001              1       < 0.1%
Other values (9982)    9982    99.8%

Minimum 5 values

Value                  Count   Frequency (%)
112001                 1       < 0.1%
319001                 1       < 0.1%
465000                 1       < 0.1%
465001                 1       < 0.1%
632000                 1       < 0.1%

Maximum 5 values

Value                  Count   Frequency (%)
6189373000             1       < 0.1%
6189295000             1       < 0.1%
6187806000             1       < 0.1%
6187616000             1       < 0.1%
6186946000             1       < 0.1%

persons
Real number (ℝ≥0)

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6389
Minimum: 1
Maximum: 28
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 4
Q3: 6
95-th percentile: 9
Maximum: 28
Range: 27
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.289897629
Coefficient of variation (CV): 0.4936294444
Kurtosis: 7.761745766
Mean: 4.6389
Median Absolute Deviation (MAD): 1
Skewness: 1.727309938
Sum: 46389
Variance: 5.243631153
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=25)
Common values

Value                Count   Frequency (%)
4                    2303    23.0%
3                    1772    17.7%
5                    1757    17.6%
6                    1165    11.7%
2                    1054    10.5%
7                    688     6.9%
1                    350     3.5%
8                    345     3.5%
9                    234     2.3%
10                   143     1.4%
Other values (15)    189     1.9%

Minimum 5 values

Value                Count   Frequency (%)
1                    350     3.5%
2                    1054    10.5%
3                    1772    17.7%
4                    2303    23.0%
5                    1757    17.6%

Maximum 5 values

Value                Count   Frequency (%)
28                   2       < 0.1%
27                   1       < 0.1%
24                   2       < 0.1%
22                   1       < 0.1%
21                   1       < 0.1%

hhwt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1792
Distinct (%): 17.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.731379
Minimum: 0
Maximum: 198
Zeros: 9
Zeros (%): 0.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 4.74
median: 10
Q3: 10
95-th percentile: 22.123
Maximum: 198
Range: 198
Interquartile range (IQR): 5.26

Descriptive statistics

Standard deviation: 8.743821307
Coefficient of variation (CV): 0.8985182169
Kurtosis: 61.21473892
Mean: 9.731379
Median Absolute Deviation (MAD): 2.46
Skewness: 5.820674417
Sum: 97313.79
Variance: 76.45441105
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
10                     3258    32.6%
2                      535     5.3%
4                      522     5.2%
6                      315     3.1%
4.64                   283     2.8%
8                      193     1.9%
12                     71      0.7%
16                     54      0.5%
14                     42      0.4%
22                     33      0.3%
Other values (1782)    4694    46.9%

Minimum 5 values

Value                  Count   Frequency (%)
0                      9       0.1%
1                      10      0.1%
1.03                   1       < 0.1%
1.04                   1       < 0.1%
1.07                   1       < 0.1%

Maximum 5 values

Value                  Count   Frequency (%)
198                    1       < 0.1%
148                    1       < 0.1%
127.39                 1       < 0.1%
120.12                 1       < 0.1%
118                    1       < 0.1%

gq
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value                                                 Count
households                                            9948
other group quarters                                  26
institutions                                          12
1-person unit created by splitting large household    9
group quarters (collective), n.s                      5

Length

Max length: 50
Median length: 10
Mean length: 10.0754
Min length: 10

Characters and Unicode

Total characters: 100754
Distinct characters: 26
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: households
2nd row: households
3rd row: households
4th row: households
5th row: households
Value                                                 Count   Frequency (%)
households                                            9948    99.5%
other group quarters                                  26      0.3%
institutions                                          12      0.1%
1-person unit created by splitting large household    9       0.1%
group quarters (collective), n.s                      5       0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
households9948
98.3%
quarters31
 
0.3%
group31
 
0.3%
other26
 
0.3%
institutions12
 
0.1%
household9
 
0.1%
unit9
 
0.1%
1-person9
 
0.1%
splitting9
 
0.1%
large9
 
0.1%
Other values (4)28
 
0.3%

Most occurring characters

ValueCountFrequency (%)
o19997
19.8%
s19983
19.8%
h19940
19.8%
e10060
10.0%
u10040
10.0%
l9985
9.9%
d9966
9.9%
r146
 
0.1%
t134
 
0.1%
121
 
0.1%
Other values (16)382
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter100595
99.8%
Space Separator121
 
0.1%
Other Punctuation10
 
< 0.1%
Decimal Number9
 
< 0.1%
Dash Punctuation9
 
< 0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
o19997
19.9%
s19983
19.9%
h19940
19.8%
e10060
10.0%
u10040
10.0%
l9985
9.9%
d9966
9.9%
r146
 
0.1%
t134
 
0.1%
i68
 
0.1%
Other values (9)276
 
0.3%
ValueCountFrequency (%)
,5
50.0%
.5
50.0%
ValueCountFrequency (%)
121
100.0%
ValueCountFrequency (%)
19
100.0%
ValueCountFrequency (%)
-9
100.0%
ValueCountFrequency (%)
(5
100.0%
ValueCountFrequency (%)
)5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin100595
99.8%
Common159
 
0.2%

Most frequent character per script

ValueCountFrequency (%)
o19997
19.9%
s19983
19.9%
h19940
19.8%
e10060
10.0%
u10040
10.0%
l9985
9.9%
d9966
9.9%
r146
 
0.1%
t134
 
0.1%
i68
 
0.1%
Other values (9)276
 
0.3%
ValueCountFrequency (%)
121
76.1%
19
 
5.7%
-9
 
5.7%
(5
 
3.1%
)5
 
3.1%
,5
 
3.1%
.5
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII100754
100.0%

Most frequent character per block

ValueCountFrequency (%)
o19997
19.8%
s19983
19.8%
h19940
19.8%
e10060
10.0%
u10040
10.0%
l9985
9.9%
d9966
9.9%
r146
 
0.1%
t134
 
0.1%
121
 
0.1%
Other values (16)382
 
0.4%

geolev1
Real number (ℝ≥0)

Distinct: 291
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 257790.0231
Minimum: 32002
Maximum: 862023
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.2 KiB

Quantile statistics

Minimum: 32002
5-th percentile: 32034
Q1: 76031
median: 152132
Q3: 484014
95-th percentile: 604021
Maximum: 862023
Range: 830021
Interquartile range (IQR): 407983

Descriptive statistics

Standard deviation: 232148.4976
Coefficient of variation (CV): 0.9005332897
Kurtosis: -0.1431791563
Mean: 257790.0231
Median Absolute Deviation (MAD): 76109
Skewness: 0.9641215669
Sum: 2577900231
Variance: 5.389292492 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
76035                 720     7.2%
76031                 474     4.7%
32006                 312     3.1%
76029                 282     2.8%
76043                 273     2.7%
76041                 233     2.3%
76052                 203     2.0%
484020                200     2.0%
76033                 195     1.9%
218009                192     1.9%
Other values (281)    6916    69.2%

Minimum 5 values

Value                 Count   Frequency (%)
32002                 60      0.6%
32006                 312     3.1%
32010                 5       0.1%
32014                 54      0.5%
32018                 20      0.2%

Maximum 5 values

Value                 Count   Frequency (%)
862023                46      0.5%
862022                7       0.1%
862021                12      0.1%
862020                26      0.3%
862019                12      0.1%

internet
Categorical

MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 1801
Missing (%): 18.0%
Memory size: 10.1 KiB

Value                    Count
no                       3865
niu (not in universe)    2762
yes                      1531
unknown                  41

Length

Max length: 21
Median length: 3
Mean length: 8.612269789
Min length: 2

Characters and Unicode

Total characters: 70612
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: niu (not in universe)
2nd row: yes
3rd row: niu (not in universe)
4th row: no
5th row: yes
Value                    Count   Frequency (%)
no                       3865    38.6%
niu (not in universe)    2762    27.6%
yes                      1531    15.3%
unknown                  41      0.4%
(Missing)                1801    18.0%
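
The length statistics for internet are computed over the non-missing values only, which is why the mean length is not simply total characters divided by 10000. A minimal sketch that reproduces the reported totals from the frequency table, assuming the 1801 missing rows are excluded from both numerator and denominator:

```python
# Label -> count for the non-missing internet values, copied from the table.
counts = {
    "no": 3865,
    "niu (not in universe)": 2762,
    "yes": 1531,
    "unknown": 41,
}

total_chars = sum(len(label) * n for label, n in counts.items())
non_missing = sum(counts.values())   # 10000 observations minus 1801 missing
mean_length = total_chars / non_missing

print(total_chars, non_missing, round(mean_length, 6))
```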
Histogram of lengths of the category
ValueCountFrequency (%)
no3865
23.4%
not2762
16.8%
in2762
16.8%
niu2762
16.8%
universe2762
16.8%
yes1531
 
9.3%
unknown41
 
0.2%

Most occurring characters

ValueCountFrequency (%)
n15036
21.3%
i8286
11.7%
8286
11.7%
e7055
10.0%
o6668
9.4%
u5565
 
7.9%
s4293
 
6.1%
(2762
 
3.9%
t2762
 
3.9%
v2762
 
3.9%
Other values (5)7137
10.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter56802
80.4%
Space Separator8286
 
11.7%
Open Punctuation2762
 
3.9%
Close Punctuation2762
 
3.9%

Most frequent character per category

ValueCountFrequency (%)
n15036
26.5%
i8286
14.6%
e7055
12.4%
o6668
11.7%
u5565
 
9.8%
s4293
 
7.6%
t2762
 
4.9%
v2762
 
4.9%
r2762
 
4.9%
y1531
 
2.7%
Other values (2)82
 
0.1%
ValueCountFrequency (%)
8286
100.0%
ValueCountFrequency (%)
(2762
100.0%
ValueCountFrequency (%)
)2762
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin56802
80.4%
Common13810
 
19.6%

Most frequent character per script

ValueCountFrequency (%)
n15036
26.5%
i8286
14.6%
e7055
12.4%
o6668
11.7%
u5565
 
9.8%
s4293
 
7.6%
t2762
 
4.9%
v2762
 
4.9%
r2762
 
4.9%
y1531
 
2.7%
Other values (2)82
 
0.1%
ValueCountFrequency (%)
8286
60.0%
(2762
 
20.0%
)2762
 
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII70612
100.0%

Most frequent character per block

ValueCountFrequency (%)
n15036
21.3%
i8286
11.7%
8286
11.7%
e7055
10.0%
o6668
9.4%
u5565
 
7.9%
s4293
 
6.1%
(2762
 
3.9%
t2762
 
3.9%
v2762
 
3.9%
Other values (5)7137
10.1%

computer
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value                    Count
no                       7381
yes                      2544
niu (not in universe)    41
unknown/missing          34

Length

Max length: 21
Median length: 2
Mean length: 2.3765
Min length: 2

Characters and Unicode

Total characters: 23765
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: no
2nd row: no
3rd row: no
4th row: yes
5th row: yes
Value                    Count   Frequency (%)
no                       7381    73.8%
yes                      2544    25.4%
niu (not in universe)    41      0.4%
unknown/missing          34      0.3%
Histogram of lengths of the category
ValueCountFrequency (%)
no7381
72.9%
yes2544
 
25.1%
not41
 
0.4%
in41
 
0.4%
niu41
 
0.4%
universe41
 
0.4%
unknown/missing34
 
0.3%

Most occurring characters

ValueCountFrequency (%)
n7681
32.3%
o7456
31.4%
s2653
 
11.2%
e2626
 
11.0%
y2544
 
10.7%
i191
 
0.8%
123
 
0.5%
u116
 
0.5%
(41
 
0.2%
t41
 
0.2%
Other values (8)293
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter23526
99.0%
Space Separator123
 
0.5%
Open Punctuation41
 
0.2%
Close Punctuation41
 
0.2%
Other Punctuation34
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
n7681
32.6%
o7456
31.7%
s2653
 
11.3%
e2626
 
11.2%
y2544
 
10.8%
i191
 
0.8%
u116
 
0.5%
t41
 
0.2%
v41
 
0.2%
r41
 
0.2%
Other values (4)136
 
0.6%
ValueCountFrequency (%)
123
100.0%
ValueCountFrequency (%)
(41
100.0%
ValueCountFrequency (%)
)41
100.0%
ValueCountFrequency (%)
/34
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin23526
99.0%
Common239
 
1.0%

Most frequent character per script

ValueCountFrequency (%)
n7681
32.6%
o7456
31.7%
s2653
 
11.3%
e2626
 
11.2%
y2544
 
10.8%
i191
 
0.8%
u116
 
0.5%
t41
 
0.2%
v41
 
0.2%
r41
 
0.2%
Other values (4)136
 
0.6%
ValueCountFrequency (%)
123
51.5%
(41
 
17.2%
)41
 
17.2%
/34
 
14.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII23765
100.0%

Most frequent character per block

ValueCountFrequency (%)
n7681
32.3%
o7456
31.4%
s2653
 
11.2%
e2626
 
11.0%
y2544
 
10.7%
i191
 
0.8%
123
 
0.5%
u116
 
0.5%
(41
 
0.2%
t41
 
0.2%
Other values (8)293
 
1.2%

pernum
Real number (ℝ≥0)

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8293
Minimum: 1
Maximum: 22
Zeros: 0
Zeros (%): 0.0%
Memory size: 9.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 4
95-th percentile: 6
Maximum: 22
Range: 21
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.846917058
Coefficient of variation (CV): 0.6527823343
Kurtosis: 4.22190227
Mean: 2.8293
Median Absolute Deviation (MAD): 1
Skewness: 1.534156834
Sum: 28293
Variance: 3.41110262
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)
Common values

Value               Count   Frequency (%)
1                   2771    27.7%
2                   2442    24.4%
3                   1885    18.9%
4                   1325    13.2%
5                   732     7.3%
6                   401     4.0%
7                   209     2.1%
8                   114     1.1%
9                   56      0.6%
10                  29      0.3%
Other values (7)    36      0.4%

Minimum 5 values

Value               Count   Frequency (%)
1                   2771    27.7%
2                   2442    24.4%
3                   1885    18.9%
4                   1325    13.2%
5                   732     7.3%

Maximum 5 values

Value               Count   Frequency (%)
22                  1       < 0.1%
16                  1       < 0.1%
15                  3       < 0.1%
14                  3       < 0.1%
13                  4       < 0.1%

perwt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1790
Distinct (%): 17.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.741712
Minimum: 1
Maximum: 198
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4.79
median: 10
Q3: 10
95-th percentile: 22.123
Maximum: 198
Range: 197
Interquartile range (IQR): 5.21

Descriptive statistics

Standard deviation: 8.738742653
Coefficient of variation (CV): 0.897043831
Kurtosis: 61.33514744
Mean: 9.741712
Median Absolute Deviation (MAD): 2.435
Skewness: 5.828609816
Sum: 97417.12
Variance: 76.36562315
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
10                     3267    32.7%
2                      535     5.3%
4                      522     5.2%
6                      315     3.1%
4.64                   283     2.8%
8                      193     1.9%
12                     71      0.7%
16                     54      0.5%
14                     42      0.4%
22                     33      0.3%
Other values (1780)    4685    46.9%

Minimum 5 values

Value                  Count   Frequency (%)
1                      10      0.1%
1.03                   1       < 0.1%
1.04                   1       < 0.1%
1.07                   1       < 0.1%
1.08                   1       < 0.1%

Maximum 5 values

Value                  Count   Frequency (%)
198                    1       < 0.1%
148                    1       < 0.1%
127.39                 1       < 0.1%
120.12                 1       < 0.1%
118                    1       < 0.1%

age
Categorical

HIGH CARDINALITY

Distinct: 100
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 15.7 KiB

Value                Count
13                   211
5                    208
12                   208
15                   202
20                   201
Other values (95)    8970

Length

Max length: 20
Median length: 2
Mean length: 2.2885
Min length: 1

Characters and Unicode

Total characters: 22885
Distinct characters: 28
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 25
2nd row: 13
3rd row: 3
4th row: 42
5th row: 57
Value                Count   Frequency (%)
13                   211     2.1%
5                    208     2.1%
12                   208     2.1%
15                   202     2.0%
20                   201     2.0%
3                    200     2.0%
14                   195     1.9%
11                   194     1.9%
18                   193     1.9%
9                    192     1.9%
Other values (90)    7996    80.0%
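
The report treats age as categorical, and its character statistics show letters and punctuation next to digits, so some values are text labels rather than plain integers. The sketch below shows one way to convert such a column to numbers before profiling. The label forms it handles ("1 year", "less than 1 year", "100+") are assumptions inferred from the word and character counts, not values confirmed by the report, and `parse_age` is a hypothetical helper.

```python
import re


def parse_age(label):
    """Return the age in whole years, or None when no number is present.

    Assumed label forms (hypothetical): "25", "1 year",
    "less than 1 year", "100+".
    """
    text = str(label).strip().lower()
    if text.startswith("less than"):
        return 0  # assumption: treat "less than 1 year" as age 0
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


for raw in ["25", "1 year", "less than 1 year", "100+", "unknown"]:
    print(repr(raw), "->", parse_age(raw))
```

With a numeric age column, the profile would report quantiles and a histogram in place of the high-cardinality warning.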
Histogram of lengths of the category
ValueCountFrequency (%)
1365
 
3.3%
year365
 
3.3%
13211
 
1.9%
12208
 
1.9%
5208
 
1.9%
15202
 
1.9%
20201
 
1.8%
3200
 
1.8%
14195
 
1.8%
11194
 
1.8%
Other values (94)8562
78.5%

Most occurring characters

ValueCountFrequency (%)
13120
13.6%
22720
11.9%
32494
10.9%
42133
9.3%
51947
8.5%
61501
 
6.6%
71277
 
5.6%
81091
 
4.8%
0983
 
4.3%
911
 
4.0%
Other values (18)4708
20.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number18152
79.3%
Lowercase Letter3819
 
16.7%
Space Separator911
 
4.0%
Math Symbol2
 
< 0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e720
18.9%
a718
18.8%
s547
14.3%
r528
13.8%
y526
13.8%
t194
 
5.1%
n194
 
5.1%
l192
 
5.0%
h192
 
5.0%
o2
 
0.1%
Other values (5)6
 
0.2%
ValueCountFrequency (%)
13120
17.2%
22720
15.0%
32494
13.7%
42133
11.8%
51947
10.7%
61501
8.3%
71277
7.0%
81091
 
6.0%
0983
 
5.4%
9886
 
4.9%
ValueCountFrequency (%)
911
100.0%
ValueCountFrequency (%)
/1
100.0%
ValueCountFrequency (%)
+2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common19066
83.3%
Latin3819
 
16.7%

Most frequent character per script

ValueCountFrequency (%)
e720
18.9%
a718
18.8%
s547
14.3%
r528
13.8%
y526
13.8%
t194
 
5.1%
n194
 
5.1%
l192
 
5.0%
h192
 
5.0%
o2
 
0.1%
Other values (5)6
 
0.2%
ValueCountFrequency (%)
13120
16.4%
22720
14.3%
32494
13.1%
42133
11.2%
51947
10.2%
61501
7.9%
71277
6.7%
81091
 
5.7%
0983
 
5.2%
911
 
4.8%
Other values (3)889
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII22885
100.0%

Most frequent character per block

ValueCountFrequency (%)
13120
13.6%
22720
11.9%
32494
10.9%
42133
9.3%
51947
8.5%
61501
 
6.6%
71277
 
5.6%
81091
 
4.8%
0983
 
4.3%
911
 
4.0%
Other values (18)4708
20.6%

sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.0 KiB

Value     Count
male      5020
female    4980

Length

Max length: 6
Median length: 4
Mean length: 4.996
Min length: 4

Characters and Unicode

Total characters: 49960
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male
Value     Count   Frequency (%)
male      5020    50.2%
female    4980    49.8%
Histogram of lengths of the category
ValueCountFrequency (%)
male5020
50.2%
female4980
49.8%

Most occurring characters

ValueCountFrequency (%)
e14980
30.0%
m10000
20.0%
a10000
20.0%
l10000
20.0%
f4980
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter49960
100.0%

Most frequent character per category

ValueCountFrequency (%)
e14980
30.0%
m10000
20.0%
a10000
20.0%
l10000
20.0%
f4980
 
10.0%

Most occurring scripts

ValueCountFrequency (%)
Latin49960
100.0%

Most frequent character per script

ValueCountFrequency (%)
e14980
30.0%
m10000
20.0%
a10000
20.0%
l10000
20.0%
f4980
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII49960
100.0%

Most frequent character per block

ValueCountFrequency (%)
e14980
30.0%
m10000
20.0%
a10000
20.0%
l10000
20.0%
f4980
 
10.0%

race
Categorical

HIGH CORRELATION
MISSING

Distinct: 12
Distinct (%): 0.2%
Missing: 4756
Missing (%): 47.6%
Memory size: 10.6 KiB

Value                             Count
white                             2632
brown (brazil)                    1742
black                             364
mestizo (indigenous and white)    313
indigenous                        79
Other values (7)                  114

Length

Max length: 30
Median length: 5
Mean length: 9.686308162
Min length: 5

Characters and Unicode

Total characters: 50795
Distinct characters: 24
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: white
2nd row: white
3rd row: white
4th row: white
5th row: white
Value                             Count   Frequency (%)
white                             2632    26.3%
brown (brazil)                    1742    17.4%
black                             364     3.6%
mestizo (indigenous and white)    313     3.1%
indigenous                        79      0.8%
asian                             39      0.4%
montubio (ecuador)                29      0.3%
unknown                           23      0.2%
afro-ecuadorian                   11      0.1%
mulatto (black and white)         6       0.1%
Other values (2)                  6       0.1%
(Missing)                         4756    47.6%
Histogram of lengths of the category
ValueCountFrequency (%)
white2951
37.0%
brown1742
21.8%
brazil1742
21.8%
indigenous392
 
4.9%
black370
 
4.6%
and319
 
4.0%
mestizo313
 
3.9%
asian39
 
0.5%
ecuador29
 
0.4%
montubio29
 
0.4%
Other values (8)52
 
0.7%

Most occurring characters

ValueCountFrequency (%)
i5869
11.6%
w4718
 
9.3%
b3883
 
7.6%
e3704
 
7.3%
r3545
 
7.0%
t3311
 
6.5%
n2993
 
5.9%
h2955
 
5.8%
2734
 
5.4%
o2595
 
5.1%
Other values (14)14488
28.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter43870
86.4%
Space Separator2734
 
5.4%
Open Punctuation2090
 
4.1%
Close Punctuation2090
 
4.1%
Dash Punctuation11
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
i5869
13.4%
w4718
10.8%
b3883
8.9%
e3704
8.4%
r3545
8.1%
t3311
7.5%
n2993
 
6.8%
h2955
 
6.7%
o2595
 
5.9%
a2579
 
5.9%
Other values (10)7718
17.6%
ValueCountFrequency (%)
2734
100.0%
ValueCountFrequency (%)
(2090
100.0%
ValueCountFrequency (%)
)2090
100.0%
ValueCountFrequency (%)
-11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin43870
86.4%
Common6925
 
13.6%

Most frequent character per script

ValueCountFrequency (%)
i5869
13.4%
w4718
10.8%
b3883
8.9%
e3704
8.4%
r3545
8.1%
t3311
7.5%
n2993
 
6.8%
h2955
 
6.7%
o2595
 
5.9%
a2579
 
5.9%
Other values (10)7718
17.6%
ValueCountFrequency (%)
2734
39.5%
(2090
30.2%
)2090
30.2%
-11
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII50795
100.0%

Most frequent character per block

ValueCountFrequency (%)
i5869
11.6%
w4718
 
9.3%
b3883
 
7.6%
e3704
 
7.3%
r3545
 
7.0%
t3311
 
6.5%
n2993
 
5.9%
h2955
 
5.8%
2734
 
5.4%
o2595
 
5.1%
Other values (14)14488
28.5%

indig
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 2147
Missing (%): 21.5%
Memory size: 10.0 KiB

Value      Count
no         6945
yes        865
unknown    43

Length

Max length: 7
Median length: 2
Mean length: 2.13752706
Min length: 2

Characters and Unicode

Total characters: 16786
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: no
2nd row: no
3rd row: no
4th row: yes
5th row: no
Value        Count   Frequency (%)
no           6945    69.5%
yes          865     8.6%
unknown      43      0.4%
(Missing)    2147    21.5%
Histogram of lengths of the category
ValueCountFrequency (%)
no6945
88.4%
yes865
 
11.0%
unknown43
 
0.5%

Most occurring characters

Value | Count | Frequency (%)
n | 7074 | 42.1%
o | 6988 | 41.6%
y | 865 | 5.2%
e | 865 | 5.2%
s | 865 | 5.2%
u | 43 | 0.3%
k | 43 | 0.3%
w | 43 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 16786 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 7074 | 42.1%
o | 6988 | 41.6%
y | 865 | 5.2%
e | 865 | 5.2%
s | 865 | 5.2%
u | 43 | 0.3%
k | 43 | 0.3%
w | 43 | 0.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 16786 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 7074 | 42.1%
o | 6988 | 41.6%
y | 865 | 5.2%
e | 865 | 5.2%
s | 865 | 5.2%
u | 43 | 0.3%
k | 43 | 0.3%
w | 43 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16786 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 7074 | 42.1%
o | 6988 | 41.6%
y | 865 | 5.2%
e | 865 | 5.2%
s | 865 | 5.2%
u | 43 | 0.3%
k | 43 | 0.3%
w | 43 | 0.3%

lit
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value | Count
yes, literate | 7916
no, illiterate | 1232
niu (not in universe) | 806
unknown/missing | 46

Length

Max length: 21
Median length: 13
Mean length: 13.7772
Min length: 13

Characters and Unicode

Total characters: 137772
Distinct characters: 21
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: yes, literate
2nd row: yes, literate
3rd row: niu (not in universe)
4th row: yes, literate
5th row: yes, literate

Value | Count | Frequency (%)
yes, literate | 7916 | 79.2%
no, illiterate | 1232 | 12.3%
niu (not in universe) | 806 | 8.1%
unknown/missing | 46 | 0.5%
Histogram of lengths of the category

Value | Count | Frequency (%)
yes | 7916 | 36.7%
literate | 7916 | 36.7%
no | 1232 | 5.7%
illiterate | 1232 | 5.7%
niu | 806 | 3.7%
universe | 806 | 3.7%
not | 806 | 3.7%
in | 806 | 3.7%
unknown/missing | 46 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
e | 27824 | 20.2%
t | 19102 | 13.9%
i | 12890 | 9.4%
(space) | 11566 | 8.4%
l | 10380 | 7.5%
r | 9954 | 7.2%
, | 9148 | 6.6%
a | 9148 | 6.6%
s | 8814 | 6.4%
y | 7916 | 5.7%
Other values (11) | 11030 | 8.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 115400 | 83.8%
Space Separator | 11566 | 8.4%
Other Punctuation | 9194 | 6.7%
Open Punctuation | 806 | 0.6%
Close Punctuation | 806 | 0.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 27824 | 24.1%
t | 19102 | 16.6%
i | 12890 | 11.2%
l | 10380 | 9.0%
r | 9954 | 8.6%
a | 9148 | 7.9%
s | 8814 | 7.6%
y | 7916 | 6.9%
n | 4640 | 4.0%
o | 2084 | 1.8%
Other values (6) | 2648 | 2.3%

Other Punctuation
Value | Count | Frequency (%)
, | 9148 | 99.5%
/ | 46 | 0.5%

Space Separator
Value | Count | Frequency (%)
(space) | 11566 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 806 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 806 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 115400 | 83.8%
Common | 22372 | 16.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 27824 | 24.1%
t | 19102 | 16.6%
i | 12890 | 11.2%
l | 10380 | 9.0%
r | 9954 | 8.6%
a | 9148 | 7.9%
s | 8814 | 7.6%
y | 7916 | 6.9%
n | 4640 | 4.0%
o | 2084 | 1.8%
Other values (6) | 2648 | 2.3%

Common
Value | Count | Frequency (%)
(space) | 11566 | 51.7%
, | 9148 | 40.9%
( | 806 | 3.6%
) | 806 | 3.6%
/ | 46 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 137772 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 27824 | 20.2%
t | 19102 | 13.9%
i | 12890 | 9.4%
(space) | 11566 | 8.4%
l | 10380 | 7.5%
r | 9954 | 7.2%
, | 9148 | 6.6%
a | 9148 | 6.6%
s | 8814 | 6.4%
y | 7916 | 5.7%
Other values (11) | 11030 | 8.0%

edattain
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value | Count
less than primary completed | 4338
primary completed | 3061
secondary completed | 1701
university completed | 480
niu (not in universe) | 374

Length

Max length: 27
Median length: 20
Mean length: 21.9258
Min length: 7

Characters and Unicode

Total characters: 219258
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: secondary completed
2nd row: primary completed
3rd row: less than primary completed
4th row: primary completed
5th row: primary completed

Value | Count | Frequency (%)
less than primary completed | 4338 | 43.4%
primary completed | 3061 | 30.6%
secondary completed | 1701 | 17.0%
university completed | 480 | 4.8%
niu (not in universe) | 374 | 3.7%
unknown | 46 | 0.5%
Histogram of lengths of the category

Value | Count | Frequency (%)
completed | 9580 | 32.6%
primary | 7399 | 25.2%
less | 4338 | 14.8%
than | 4338 | 14.8%
secondary | 1701 | 5.8%
university | 480 | 1.6%
niu | 374 | 1.3%
universe | 374 | 1.3%
not | 374 | 1.3%
in | 374 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
e | 26427 | 12.1%
(space) | 19378 | 8.8%
r | 17353 | 7.9%
m | 16979 | 7.7%
p | 16979 | 7.7%
t | 14772 | 6.7%
l | 13918 | 6.3%
a | 13438 | 6.1%
o | 11701 | 5.3%
c | 11281 | 5.1%
Other values (12) | 57032 | 26.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 199132 | 90.8%
Space Separator | 19378 | 8.8%
Open Punctuation | 374 | 0.2%
Close Punctuation | 374 | 0.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 26427 | 13.3%
r | 17353 | 8.7%
m | 16979 | 8.5%
p | 16979 | 8.5%
t | 14772 | 7.4%
l | 13918 | 7.0%
a | 13438 | 6.7%
o | 11701 | 5.9%
c | 11281 | 5.7%
d | 11281 | 5.7%
Other values (9) | 45003 | 22.6%

Space Separator
Value | Count | Frequency (%)
(space) | 19378 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 374 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 374 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 199132 | 90.8%
Common | 20126 | 9.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 26427 | 13.3%
r | 17353 | 8.7%
m | 16979 | 8.5%
p | 16979 | 8.5%
t | 14772 | 7.4%
l | 13918 | 7.0%
a | 13438 | 6.7%
o | 11701 | 5.9%
c | 11281 | 5.7%
d | 11281 | 5.7%
Other values (9) | 45003 | 22.6%

Common
Value | Count | Frequency (%)
(space) | 19378 | 96.3%
( | 374 | 1.9%
) | 374 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 219258 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 26427 | 12.1%
(space) | 19378 | 8.8%
r | 17353 | 7.9%
m | 16979 | 7.7%
p | 16979 | 7.7%
t | 14772 | 6.7%
l | 13918 | 6.3%
a | 13438 | 6.1%
o | 11701 | 5.3%
c | 11281 | 5.1%
Other values (12) | 57032 | 26.0%

edattaind
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.6 KiB

Value | Count
some primary completed | 2245
primary (6 yrs) completed | 1791
no schooling | 1615
secondary, general track completed | 1120
lower secondary general completed | 1093
Other values (9) | 2136

Length

Max length: 36
Median length: 22
Mean length: 23.8095
Min length: 12

Characters and Unicode

Total characters: 238095
Distinct characters: 29
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: secondary, general track completed
2nd row: primary (6 yrs) completed
3rd row: no schooling
4th row: lower secondary general completed
5th row: primary (6 yrs) completed

Value | Count | Frequency (%)
some primary completed | 2245 | 22.4%
primary (6 yrs) completed | 1791 | 17.9%
no schooling | 1615 | 16.2%
secondary, general track completed | 1120 | 11.2%
lower secondary general completed | 1093 | 10.9%
university completed | 480 | 4.8%
primary (4 yrs) completed | 478 | 4.8%
niu (not in universe) | 374 | 3.7%
some college completed | 347 | 3.5%
post-secondary technical education | 193 | 1.9%
Other values (4) | 264 | 2.6%
Histogram of lengths of the category

Value | Count | Frequency (%)
completed | 7772 | 23.6%
primary | 4670 | 14.2%
some | 2592 | 7.9%
yrs | 2425 | 7.4%
secondary | 2275 | 6.9%
general | 2213 | 6.7%
6 | 1791 | 5.4%
no | 1615 | 4.9%
schooling | 1615 | 4.9%
track | 1161 | 3.5%
Other values (13) | 4758 | 14.5%

Most occurring characters

Value | Count | Frequency (%)
e | 28514 | 12.0%
(space) | 22887 | 9.6%
o | 19944 | 8.4%
r | 19575 | 8.2%
m | 15080 | 6.3%
c | 14066 | 5.9%
l | 13663 | 5.7%
p | 12635 | 5.3%
a | 10960 | 4.6%
n | 10519 | 4.4%
Other values (19) | 70252 | 29.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 205785 | 86.4%
Space Separator | 22887 | 9.6%
Open Punctuation | 2799 | 1.2%
Close Punctuation | 2799 | 1.2%
Decimal Number | 2425 | 1.0%
Other Punctuation | 1207 | 0.5%
Dash Punctuation | 193 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 28514 | 13.9%
o | 19944 | 9.7%
r | 19575 | 9.5%
m | 15080 | 7.3%
c | 14066 | 6.8%
l | 13663 | 6.6%
p | 12635 | 6.1%
a | 10960 | 5.3%
n | 10519 | 5.1%
d | 10433 | 5.1%
Other values (10) | 50396 | 24.5%

Decimal Number
Value | Count | Frequency (%)
6 | 1791 | 73.9%
4 | 478 | 19.7%
5 | 156 | 6.4%

Other Punctuation
Value | Count | Frequency (%)
, | 1161 | 96.2%
/ | 46 | 3.8%

Space Separator
Value | Count | Frequency (%)
(space) | 22887 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2799 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2799 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 193 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 205785 | 86.4%
Common | 32310 | 13.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 28514 | 13.9%
o | 19944 | 9.7%
r | 19575 | 9.5%
m | 15080 | 7.3%
c | 14066 | 6.8%
l | 13663 | 6.6%
p | 12635 | 6.1%
a | 10960 | 5.3%
n | 10519 | 5.1%
d | 10433 | 5.1%
Other values (10) | 50396 | 24.5%

Common
Value | Count | Frequency (%)
(space) | 22887 | 70.8%
( | 2799 | 8.7%
) | 2799 | 8.7%
6 | 1791 | 5.5%
, | 1161 | 3.6%
4 | 478 | 1.5%
- | 193 | 0.6%
5 | 156 | 0.5%
/ | 46 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 238095 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 28514 | 12.0%
(space) | 22887 | 9.6%
o | 19944 | 8.4%
r | 19575 | 8.2%
m | 15080 | 6.3%
c | 14066 | 5.9%
l | 13663 | 5.7%
p | 12635 | 5.3%
a | 10960 | 4.6%
n | 10519 | 4.4%
Other values (19) | 70252 | 29.5%

empstat
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value | Count
inactive | 3982
employed | 3785
niu (not in universe) | 1828
unemployed | 366
unknown/missing | 39

Length

Max length: 21
Median length: 8
Mean length: 10.4769
Min length: 8

Characters and Unicode

Total characters: 104769
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: employed
2nd row: inactive
3rd row: niu (not in universe)
4th row: employed
5th row: employed

Value | Count | Frequency (%)
inactive | 3982 | 39.8%
employed | 3785 | 37.9%
niu (not in universe) | 1828 | 18.3%
unemployed | 366 | 3.7%
unknown/missing | 39 | 0.4%
Histogram of lengths of the category

Value | Count | Frequency (%)
inactive | 3982 | 25.7%
employed | 3785 | 24.4%
not | 1828 | 11.8%
in | 1828 | 11.8%
niu | 1828 | 11.8%
universe | 1828 | 11.8%
unemployed | 366 | 2.4%
unknown/missing | 39 | 0.3%

Most occurring characters

Value | Count | Frequency (%)
e | 15940 | 15.2%
i | 13526 | 12.9%
n | 11816 | 11.3%
o | 6018 | 5.7%
t | 5810 | 5.5%
v | 5810 | 5.5%
(space) | 5484 | 5.2%
m | 4190 | 4.0%
p | 4151 | 4.0%
l | 4151 | 4.0%
Other values (13) | 27873 | 26.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 95590 | 91.2%
Space Separator | 5484 | 5.2%
Open Punctuation | 1828 | 1.7%
Close Punctuation | 1828 | 1.7%
Other Punctuation | 39 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 15940 | 16.7%
i | 13526 | 14.2%
n | 11816 | 12.4%
o | 6018 | 6.3%
t | 5810 | 6.1%
v | 5810 | 6.1%
m | 4190 | 4.4%
p | 4151 | 4.3%
l | 4151 | 4.3%
y | 4151 | 4.3%
Other values (9) | 20027 | 21.0%

Space Separator
Value | Count | Frequency (%)
(space) | 5484 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1828 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1828 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 39 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 95590 | 91.2%
Common | 9179 | 8.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 15940 | 16.7%
i | 13526 | 14.2%
n | 11816 | 12.4%
o | 6018 | 6.3%
t | 5810 | 6.1%
v | 5810 | 6.1%
m | 4190 | 4.4%
p | 4151 | 4.3%
l | 4151 | 4.3%
y | 4151 | 4.3%
Other values (9) | 20027 | 21.0%

Common
Value | Count | Frequency (%)
(space) | 5484 | 59.7%
( | 1828 | 19.9%
) | 1828 | 19.9%
/ | 39 | 0.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 104769 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 15940 | 15.2%
i | 13526 | 12.9%
n | 11816 | 11.3%
o | 6018 | 5.7%
t | 5810 | 5.5%
v | 5810 | 5.5%
(space) | 5484 | 5.2%
m | 4190 | 4.0%
p | 4151 | 4.0%
l | 4151 | 4.0%
Other values (13) | 27873 | 26.6%

empstatd
Categorical

HIGH CORRELATION

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.3 KiB

Value | Count
at work | 3542
niu (not in universe) | 1828
inactive (not in labor force) | 1713
housework | 947
in school | 819
Other values (19) | 1151

Length

Max length: 42
Median length: 9
Mean length: 15.7419
Min length: 7

Characters and Unicode

Total characters: 157419
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: at work
2nd row: in school
3rd row: niu (not in universe)
4th row: at work
5th row: at work

Value | Count | Frequency (%)
at work | 3542 | 35.4%
niu (not in universe) | 1828 | 18.3%
inactive (not in labor force) | 1713 | 17.1%
housework | 947 | 9.5%
in school | 819 | 8.2%
inactive, other reasons | 339 | 3.4%
unemployed, not specified | 277 | 2.8%
have job, not at work in reference period | 84 | 0.8%
employed, not specified | 79 | 0.8%
permanent disability | 64 | 0.6%
Other values (14) | 308 | 3.1%
Histogram of lengths of the category

Value | Count | Frequency (%)
in | 4444 | 15.1%
not | 3986 | 13.6%
work | 3669 | 12.5%
at | 3647 | 12.4%
inactive | 2055 | 7.0%
universe | 1828 | 6.2%
niu | 1828 | 6.2%
force | 1713 | 5.8%
labor | 1713 | 5.8%
housework | 947 | 3.2%
Other values (36) | 3536 | 12.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 19366 | 12.3%
o | 16231 | 10.3%
n | 15670 | 10.0%
i | 13686 | 8.7%
e | 12118 | 7.7%
r | 11434 | 7.3%
t | 10335 | 6.6%
a | 8269 | 5.3%
c | 5097 | 3.2%
u | 5050 | 3.2%
Other values (19) | 40163 | 25.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 129989 | 82.6%
Space Separator | 19366 | 12.3%
Open Punctuation | 3541 | 2.2%
Close Punctuation | 3541 | 2.2%
Other Punctuation | 982 | 0.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 16231 | 12.5%
n | 15670 | 12.1%
i | 13686 | 10.5%
e | 12118 | 9.3%
r | 11434 | 8.8%
t | 10335 | 8.0%
a | 8269 | 6.4%
c | 5097 | 3.9%
u | 5050 | 3.9%
s | 4955 | 3.8%
Other values (14) | 27144 | 20.9%

Other Punctuation
Value | Count | Frequency (%)
, | 917 | 93.4%
/ | 65 | 6.6%

Space Separator
Value | Count | Frequency (%)
(space) | 19366 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3541 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3541 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 129989 | 82.6%
Common | 27430 | 17.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 16231 | 12.5%
n | 15670 | 12.1%
i | 13686 | 10.5%
e | 12118 | 9.3%
r | 11434 | 8.8%
t | 10335 | 8.0%
a | 8269 | 6.4%
c | 5097 | 3.9%
u | 5050 | 3.9%
s | 4955 | 3.8%
Other values (14) | 27144 | 20.9%

Common
Value | Count | Frequency (%)
(space) | 19366 | 70.6%
( | 3541 | 12.9%
) | 3541 | 12.9%
, | 917 | 3.3%
/ | 65 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 157419 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 19366 | 12.3%
o | 16231 | 10.3%
n | 15670 | 10.0%
i | 13686 | 8.7%
e | 12118 | 7.7%
r | 11434 | 7.3%
t | 10335 | 6.6%
a | 8269 | 5.3%
c | 5097 | 3.2%
u | 5050 | 3.2%
Other values (19) | 40163 | 25.5%

labforce
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.1 KiB

Value | Count
yes, in the labor force | 4068
no, not in the labor force | 3060
niu (not in universe) | 2841
unknown | 31

Length

Max length: 26
Median length: 23
Mean length: 23.3002
Min length: 7

Characters and Unicode

Total characters: 233002
Distinct characters: 22
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: yes, in the labor force
2nd row: niu (not in universe)
3rd row: niu (not in universe)
4th row: yes, in the labor force
5th row: yes, in the labor force

Value | Count | Frequency (%)
yes, in the labor force | 4068 | 40.7%
no, not in the labor force | 3060 | 30.6%
niu (not in universe) | 2841 | 28.4%
unknown | 31 | 0.3%
Histogram of lengths of the category

Value | Count | Frequency (%)
in | 9969 | 19.9%
labor | 7128 | 14.2%
the | 7128 | 14.2%
force | 7128 | 14.2%
not | 5901 | 11.8%
yes | 4068 | 8.1%
no | 3060 | 6.1%
niu | 2841 | 5.7%
universe | 2841 | 5.7%
unknown | 31 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 40095 | 17.2%
n | 24705 | 10.6%
e | 24006 | 10.3%
o | 23248 | 10.0%
r | 17097 | 7.3%
i | 15651 | 6.7%
t | 13029 | 5.6%
, | 7128 | 3.1%
h | 7128 | 3.1%
l | 7128 | 3.1%
Other values (12) | 53787 | 23.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 180097 | 77.3%
Space Separator | 40095 | 17.2%
Other Punctuation | 7128 | 3.1%
Open Punctuation | 2841 | 1.2%
Close Punctuation | 2841 | 1.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 24705 | 13.7%
e | 24006 | 13.3%
o | 23248 | 12.9%
r | 17097 | 9.5%
i | 15651 | 8.7%
t | 13029 | 7.2%
h | 7128 | 4.0%
l | 7128 | 4.0%
a | 7128 | 4.0%
b | 7128 | 4.0%
Other values (8) | 33849 | 18.8%

Other Punctuation
Value | Count | Frequency (%)
, | 7128 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 40095 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2841 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2841 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 180097 | 77.3%
Common | 52905 | 22.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 24705 | 13.7%
e | 24006 | 13.3%
o | 23248 | 12.9%
r | 17097 | 9.5%
i | 15651 | 8.7%
t | 13029 | 7.2%
h | 7128 | 4.0%
l | 7128 | 4.0%
a | 7128 | 4.0%
b | 7128 | 4.0%
Other values (8) | 33849 | 18.8%

Common
Value | Count | Frequency (%)
(space) | 40095 | 75.8%
, | 7128 | 13.5%
( | 2841 | 5.4%
) | 2841 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 233002 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 40095 | 17.2%
n | 24705 | 10.6%
e | 24006 | 10.3%
o | 23248 | 10.0%
r | 17097 | 7.3%
i | 15651 | 6.7%
t | 13029 | 5.6%
, | 7128 | 3.1%
h | 7128 | 3.1%
l | 7128 | 3.1%
Other values (12) | 53787 | 23.1%

Interactions

[56 interaction plots rendered with Matplotlib v3.3.3; images not included in this text version]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
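
A minimal sketch of that calculation in plain Python, using made-up toy data rather than columns from this dataset. The function name `pearson_r` is an assumption for illustration, and the report's own computation may differ in details such as the handling of missing values.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the two standard deviations
    # cancel, so plain sums of products are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data (assumption): ys is an exact linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

r_linear = pearson_r(xs, ys)
# Shifting and positively rescaling xs leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
# Reversing the direction of ys flips the sign.
r_negative = pearson_r(xs, [-y for y in ys])
```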

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
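
The rank-based calculation can be sketched as follows. Ties receive the average of their rank positions, which is a common convention assumed here; the helper names and the toy data are illustrative only.

```python
import math

def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data (assumption): monotonic but nonlinear relationship (cubes).
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
```

On this toy input the ranks agree perfectly, so ρ reaches 1 while Pearson's r stays below 1 because the relationship is not linear.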

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
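
The pair-counting formula as stated can be sketched directly. This variant is often called tau-a; tie-adjusted variants exist and may give different values when the data contain ties, so the sketch is not necessarily what produced the report's heatmap. The function name and toy data are assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)

# Toy data (assumption): one swapped neighbour in an otherwise sorted list.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau_a(xs, ys)
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.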

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
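
A sketch of a bias-corrected Cramér's V in plain Python, following the correction as it is commonly stated for Bergsma's estimator: φ² is reduced by (k-1)(r-1)/(n-1) and the table dimensions are shrunk accordingly. The function name and the toy inputs are assumptions, and the report's own implementation may differ.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy data (assumption): ys fully determined by xs.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
# Toy data (assumption): every combination equally frequent.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25,
                                    ["x", "y", "x", "y"] * 25)
```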

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
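
The nullity correlation described for the heatmap can be illustrated with a small sketch. It assumes the measure is the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing); the function names and the toy columns are hypothetical and do not reproduce the report's data.

```python
import math

def presence_indicator(column):
    """Map each cell to 1.0 if a value is present and 0.0 if it is missing (None)."""
    return [0.0 if value is None else 1.0 for value in column]

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = presence_indicator(col_a)
    b = presence_indicator(col_b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation.
        return math.nan
    return cov / math.sqrt(var_a * var_b)

# Toy columns (assumption): None marks a missing cell.
race = ["white", None, "white", None, None, "white"]
indig = ["no", "no", "no", None, None, "yes"]
internet = [None, "yes", None, "no", "no", None]

same = nullity_correlation(race, race)
opposite = nullity_correlation(race, internet)
partial = nullity_correlation(race, indig)
```

A value near 1 means two columns tend to be missing on the same rows, a value near -1 means one tends to be present exactly where the other is missing.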

Sample

First rows

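index | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce
0 | 21899551 | brazil | 2010 | brazil 2010 | 5.311042e+09 | 2 | 2.35 | households | 76043 | niu (not in universe) | no | 1 | 2.35 | 25 | male | white | no | yes, literate | secondary completed | secondary, general track completed | employed | at work | yes, in the labor force
1 | 38283085 | mexico | 2015 | mexico 2015 | 9.875930e+08 | 4 | 50.00 | households | 484014 | yes | no | 2 | 50.00 | 13 | female | NaN | no | yes, literate | primary completed | primary (6 yrs) completed | inactive | in school | niu (not in universe)
2 | 7388087 | brazil | 2010 | brazil 2010 | 8.805020e+08 | 4 | 10.47 | households | 76023 | niu (not in universe) | no | 2 | 10.47 | 3 | female | white | no | niu (not in universe) | less than primary completed | no schooling | niu (not in universe) | niu (not in universe) | niu (not in universe)
3 | 40775034 | mexico | 2015 | mexico 2015 | 1.632011e+09 | 5 | 2.00 | households | 484020 | no | yes | 5 | 2.00 | 42 | female | NaN | yes | yes, literate | primary completed | lower secondary general completed | employed | at work | yes, in the labor force
4 | 1064103 | argentina | 2010 | argentina 2010 | 3.530430e+08 | 3 | 10.00 | households | 32006 | NaN | yes | 1 | 10.00 | 57 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | at work | yes, in the labor force
5 | 21924791 | brazil | 2010 | brazil 2010 | 5.319560e+09 | 3 | 3.86 | households | 76043 | yes | yes | 3 | 3.86 | 76 | female | white | no | yes, literate | primary completed | primary (6 yrs) completed | inactive | inactive (not in labor force) | no, not in the labor force
6 | 51705152 | venezuela | 2001 | venezuela 2001 | 4.073540e+08 | 2 | 10.00 | households | 862015 | no | no | 2 | 10.00 | 30 | male | NaN | NaN | yes, literate | primary completed | lower secondary general completed | employed | at work | yes, in the labor force
7 | 20333973 | brazil | 2010 | brazil 2010 | 4.811139e+09 | 2 | 3.05 | households | 76041 | niu (not in universe) | no | 2 | 3.05 | 70 | female | white | no | no, illiterate | less than primary completed | no schooling | inactive | inactive (not in labor force) | no, not in the labor force
8 | 45447027 | mexico | 2015 | mexico 2015 | 2.851823e+09 | 13 | 6.00 | households | 484031 | no | no | 3 | 6.00 | 28 | female | NaN | yes | yes, literate | primary completed | primary (6 yrs) completed | inactive | housework | no, not in the labor force
9 | 52338114 | venezuela | 2001 | venezuela 2001 | 5.835450e+08 | 7 | 10.00 | households | 862023 | no | no | 6 | 10.00 | 18 | male | NaN | NaN | yes, literate | secondary completed | secondary, general track completed | unemployed | unemployed, new worker | yes, in the labor force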
952338114venezuela2001venezuela 20015.835450e+08710.00households862023nono610.0018maleNaNNaNyes, literatesecondary completedsecondary, general track completedunemployedunemployed, new workeryes, in the labor force

Last rows

index | df_index | country | year | sample | serial | persons | hhwt | gq | geolev1 | internet | computer | pernum | perwt | age | sex | race | indig | lit | edattain | edattaind | empstat | empstatd | labforce
9990 | 8471333 | brazil | 2010 | brazil 2010 | 1.184290e+09 | 2 | 3.83 | households | 76025 | niu (not in universe) | no | 2 | 3.83 | 53 | female | brown (brazil) | no | no, illiterate | less than primary completed | some primary completed | inactive | inactive (not in labor force) | no, not in the labor force
9991 | 7453645 | brazil | 2010 | brazil 2010 | 8.984330e+08 | 8 | 5.23 | households | 76023 | niu (not in universe) | no | 6 | 5.23 | 15 | female | brown (brazil) | no | yes, literate | primary completed | primary (6 yrs) completed | inactive | inactive (not in labor force) | no, not in the labor force
9992 | 2248829 | argentina | 2010 | argentina 2010 | 7.217800e+08 | 8 | 10.00 | households | 32018 | NaN | no | 6 | 10.00 | 11 | female | NaN | NaN | yes, literate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe)
9993 | 5912698 | brazil | 2010 | brazil 2010 | 4.900580e+08 | 4 | 8.72 | households | 76021 | niu (not in universe) | no | 3 | 8.72 | 8 | male | white | no | yes, literate | less than primary completed | some primary completed | niu (not in universe) | niu (not in universe) | niu (not in universe)
9994 | 42426758 | mexico | 2015 | mexico 2015 | 2.051680e+09 | 1 | 10.00 | households | 484021 | no | no | 1 | 10.00 | 60 | male | NaN | no | yes, literate | university completed | university completed | inactive | retirees and living on rent | no, not in the labor force
9995 | 26255366 | colombia | 2005 | colombia 2005 | 3.979800e+07 | 9 | 4.64 | households | 170005 | NaN | no | 4 | 4.64 | 21 | male | white | no | yes, literate | less than primary completed | some primary completed | employed | at work | yes, in the labor force
9996 | 4184185 | brazil | 2010 | brazil 2010 | 6.417800e+07 | 5 | 8.83 | households | 76012 | niu (not in universe) | no | 3 | 8.83 | 91 | female | white | no | no, illiterate | less than primary completed | no schooling | inactive | inactive (not in labor force) | no, not in the labor force
9997 | 30656672 | dominican republic | 2010 | dominican republic 2010 | 3.642400e+07 | 2 | 10.00 | households | 214018 | no | yes | 1 | 10.00 | 28 | male | NaN | NaN | yes, literate | primary completed | primary (6 yrs) completed | employed | employed, not specified | yes, in the labor force
9998 | 9051171 | brazil | 2010 | brazil 2010 | 1.349303e+09 | 6 | 5.47 | households | 76026 | niu (not in universe) | no | 2 | 5.47 | 45 | female | white | no | no, illiterate | less than primary completed | some primary completed | inactive | inactive (not in labor force) | no, not in the labor force
9999 | 46094153 | nicaragua | 2005 | nicaragua 2005 | 7.503100e+07 | 5 | 10.00 | households | 558055 | no | no | 1 | 10.00 | 44 | female | NaN | no | yes, literate | primary completed | lower secondary general completed | employed | at work | yes, in the labor force